14.4 The Cell Cycle


Fig. 14.3 Simplified schematic diagram of eukaryotic gene structure. A is the antiparallel double

helix. Rectangles represent genes and dashed lines represent intergenic sequences. B is an expansion

of a (a single gene) in A. The shaded rectangles correspond to DNA segments transcribed into

RNA, spliced, and translated continuously into proteins. p is a promoter sequence. In reality, this

is usually more complex than a single nucleotide segment; it may comprise a sequence to which an

activator protein can bind (the promoter site proper) and also one or more enhancer sites, farther

“upstream” from the gene itself, to which additional transcription factors (TF) may bind. All

of these segments together are called the transcription factor binding site (TFBS). There may be

some DNA of indeterminate purpose between p and the transcription start site (TSS) marked with

an arrow. Either several individual proteins bind to the various receptor sites, and are only effective

all together, or the proteins preassociate and bind en bloc to the TFBS. In both cases, one anticipates

that the conformational flexibility of the DNA is of great importance in determining the affinity of

the binding. To the right of the TSS: shaded regions, exons; unshaded regions, introns

genome, which does not exceed 2 × 10⁸ base pairs). Another kind of repetition

occurs as the duplication, and possibly further duplication, of whole genes along the

chromosome (or on another chromosome). The apparently superfluous copies tend

to acquire mutations, vitiating their ability to be translated into a functional protein,

whereupon they are called pseudogenes. 24 Gene duplication may be considered as

a form of cellular computing. 25 In the human being, satellite sequences of repeti-

tive DNA alone constitute about 5% of the genome; in the horse, they constitute

about 45%. Telomere sequences are further examples of repetitive DNA (in humans,

TTAGGG is repeated for 3–20 kilobases). Between the telomere and the remainder

of the chromosome there are 100–300 kilobases of telomere-associated repeats.
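
To give a feel for the numbers quoted above, the following minimal Python sketch converts the 3–20 kilobase range into an approximate number of copies of the TTAGGG unit, and counts the longest uninterrupted tandem run of that unit in a sequence. The function names and the toy sequence are illustrative assumptions and do not come from the text; real telomere analysis would also have to deal with degenerate repeats and sequencing errors, which are ignored here.

```python
import re

# Human telomeric repeat unit, as quoted in the text.
TELOMERE_UNIT = "TTAGGG"


def repeat_count_range(min_kb, max_kb, unit=TELOMERE_UNIT):
    """Whole copies of `unit` that fit in min_kb and max_kb kilobases.

    Hypothetical helper: assumes 1 kilobase = 1000 bases and perfect repeats.
    """
    unit_len = len(unit)
    return (min_kb * 1000) // unit_len, (max_kb * 1000) // unit_len


def longest_tandem_run(seq, unit=TELOMERE_UNIT):
    """Number of copies in the longest uninterrupted tandem run of `unit`.

    Hypothetical helper: only exact, back-to-back copies are counted.
    """
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


# 3-20 kilobases of a 6-base unit is roughly 500 to 3333 copies.
low, high = repeat_count_range(3, 20)
print(low, high)

# Toy sequence (invented for illustration): runs of 4 and 2 copies.
toy = "ACGT" + TELOMERE_UNIT * 4 + "GATTACA" + TELOMERE_UNIT * 2
print(longest_tandem_run(toy))
```

Integer division is used so that only complete copies of the six-base unit are counted.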

14.4.3

The C-Value Paradox

Well before genome sequence information became available, it was clear that the

amount of DNA in an organism’s cells (the C-value; more precisely it is the mass

of DNA within a haploid nucleus) did not correlate particularly well with the organ-

ism’s complexity, and this became known as the “C-value paradox”. Examination of

24 For a concrete example, see Hittinger and Carroll (2007).

25 Shapiro (2005).